1.  Introduction


Convergence theory for unconstrained minimization centers on
proving which properties hold at points of accumulation of "minimizing
sequences" generated by unconstrained minimization algorithms.
First order convergence refers to proofs that accumulation points are
stationary points. Second order convergence is concerned with the
additional property that the Hessian matrix at a point of accumulation
satisfies the second order necessary condition, namely, that the Hessian
matrix there be positive semidefinite. To a great extent, convergence
proofs for particular algorithms have many common elements. In this
paper these results are synthesized and put in a general context.
Applications are given in Section 5.

2.  Step-size  Procedures 

For the unconstrained minimization problem

\[ \min f(x) \quad \text{s.t. } x \in H \subset E^n \quad (\text{here } H \text{ is an open set}), \]

a general algorithm usually takes the following form.

At iteration $k$, given $x_k \in H$, construct a directed curve
$y_k(t)$ parameterized by the single variable $t$. The curve should
have the two properties that $y_k(0) = x_k$, and for $t$ positive and
small $f[y_k(t)] \le f(x_k)$. Use some suitable step size procedure to
obtain a value $t_k$ and set

\[ x_{k+1} = y_k(t_k). \]

Most of the time, $y_k(t) = x_k + s_k t$, where $s_k$ is an
$n \times 1$ direction of search.


Below  are  the  five  step  size  procedures  which  will  be  used  at 
different  times  throughout  this  paper.  First,  definitions  are  required. 

A vector $s_k$ is a nonascent direction at $x_k$ if

\[ s_k^T \nabla f(x_k) \le 0. \]

A vector $d_k$ is a direction of nonpositive curvature at $x_k$ if

\[ d_k^T \nabla^2 f(x_k) d_k \le 0. \]

A vector $s_k$ is a descent direction at $x_k$ if

\[ s_k^T \nabla f(x_k) < 0. \]

A vector $d_k$ is a direction of negative curvature at $x_k$ if

\[ d_k^T \nabla^2 f(x_k) d_k < 0. \]
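The four definitions above reduce to sign tests on two scalars, the slope $s^T \nabla f(x)$ and the curvature $d^T \nabla^2 f(x) d$. The following is a minimal sketch of those tests, not part of the original paper; the function name `classify_direction` and the example function $f(x, y) = x^2 - y^2$ are assumptions chosen for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def classify_direction(direction, gradient, hessian):
    """Apply the four sign tests to one vector (hypothetical helper)."""
    slope = dot(direction, gradient)
    hessian_times_direction = [dot(row, direction) for row in hessian]
    curvature = dot(direction, hessian_times_direction)
    return {
        "nonascent": slope <= 0.0,
        "descent": slope < 0.0,
        "nonpositive_curvature": curvature <= 0.0,
        "negative_curvature": curvature < 0.0,
    }


# Illustrative data: f(x, y) = x**2 - y**2 evaluated at the point (1, 0).
gradient = [2.0, 0.0]
hessian = [[2.0, 0.0], [0.0, -2.0]]

steepest = classify_direction([-2.0, 0.0], gradient, hessian)
saddle_axis = classify_direction([0.0, 1.0], gradient, hessian)
```

In this example the negative gradient is a descent direction with positive curvature, while the vector along the second axis is nonascent (zero slope) and has negative curvature, which is the kind of pair $(s_k, d_k)$ used later by SSP V.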

The first three step size procedures are called optimal step size
procedures and refer to the following problem: given a point
$x_k \in H$, a given open set, and given a direction $s_k$, solve

\[ \underset{t \ge 0}{\text{minimize}} \; f[x_k + s_k t] \qquad (1) \]

subject to the restriction that

\[ t \in \{ t \ge 0 \mid x_k + s_k t \in H \}. \]
FIRST LOCAL MINIMIZER (SSP I):

Set $t_k$ to be the first local minimizer for (1).

GLOBAL MINIMIZER (SSP II):

Set $t_k$ to be a global minimizer for (1).

LOCAL MINIMIZER WITH SMALLER FUNCTION VALUE (SSP III):

Set $t_k$ to be any local minimizer for (1) with the additional
property that

\[ f(x_k + s_k t_k) \le f(x_k). \]
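To make the difference between SSP I and SSP II concrete, the sketch below estimates both on a uniform grid for a one-dimensional function $\phi(t) = f(x_k + s_k t)$. This is only an illustration under stated assumptions: the grid search, the function names, the bound `t_max`, and the double-well test function `phi` are all hypothetical, and a grid can miss narrow minima, so this is not a method from the paper.

```python
def sample_line(phi, t_max, n):
    """Return grid points t_j in [0, t_max] and the values phi(t_j)."""
    ts = [t_max * j / n for j in range(n + 1)]
    return ts, [phi(t) for t in ts]


def first_local_minimizer(phi, t_max, n=1000):
    """Grid estimate for SSP I: first interior point not above its neighbours."""
    ts, vals = sample_line(phi, t_max, n)
    for j in range(1, n):
        if vals[j] <= vals[j - 1] and vals[j] <= vals[j + 1]:
            return ts[j]
    return None  # no interior local minimizer detected on this grid


def global_minimizer(phi, t_max, n=1000):
    """Grid estimate for SSP II: grid point with the smallest value."""
    ts, vals = sample_line(phi, t_max, n)
    best = min(range(n + 1), key=lambda j: vals[j])
    return ts[best]


def phi(t):
    """Hypothetical double-well function with the deeper well on the right."""
    return (t - 1.0) ** 2 * (t - 3.0) ** 2 - 0.5 * t


t_first = first_local_minimizer(phi, 4.0)
t_global = global_minimizer(phi, 4.0)
```

For this test function the first local minimizer lies near $t = 1.07$ and the global minimizer near $t = 3.06$, so the two procedures return different steps. Both values also satisfy the SSP III requirement, since each is a local minimizer with $\phi(t) \le \phi(0)$.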

There are several difficulties associated with the use of SSP I -
SSP III. In the first place, a solution may not exist; i.e., any
infimum may be taken on at a point $t_k$ where $x_k + s_k t_k$ is on the
boundary of $H$. Any algorithm which uses these step size procedures is
required to ascertain that a solution exists. (If it does, it is
obviously unconstrained.) Second, even if solutions to SSP I and SSP II
are known to exist, there is usually no guarantee that the desired
$t_k$ can be found when the function $f(x)$ is a general, not
necessarily convex, function. A method for solving SSP I and SSP II is
described in [McCormick, 1979].

When, for fixed $x_k, s_k$, the function $f(x_k + s_k t)$ is convex in
$t$, all three of the first step size procedures reduce to the same
problem.


Most of the published convergence theorems for algorithms solving
the unconstrained optimization problem rely on either SSP I or SSP II.
The guaranteed ability to solve SSP III has not proved a strong enough
tool for proving these theorems.

A third difficulty which has concerned algorithmists in recent
years is that these step size procedures are idealized in that it is
usually impossible to find a local minimizer exactly. Convergence
proofs and rate of convergence proofs relying on exact minimization are
thus suspect. Some effort has been put forward in stating weaker
requirements on the three optimal step size procedures. This amounts
essentially to deciding how far from exact minimization one can come and
still prove the desired theorems. The reader is referred to [Polak,
1971], [Cohen, 1972], and [McCormick and Ritter, 1974] for more on this
subject.

When $f(x)$ is strictly convex, it is easy to show that the $t_k$
specified by the optimal step size procedures is the same and that an
algorithm can be specified to guarantee finding the value (if it
exists). For methods of conjugate directions, the optimal step size
procedure is necessary in accelerating the rate at which the algorithm
converges to the global minimizer. Many unconstrained minimization
algorithms (e.g., Newton's method and some quasi-Newton methods) do not
depend upon optimal step size procedures for their rate of convergence
properties. For these it is possible to use step size procedures of a
type introduced by [Armijo, 1966]. Some generalizations of his approach
are described below. As before, assume that $x_k \in H$, a given open
set. Let $s_k, d_k$ be directions, where $s_k$ is a nonascent direction
and $d_k$ is a nonascent direction that is also one of nonpositive
curvature. Let $0 < \alpha < 1$ be a preassigned constant.

FIRST ORDER ARMIJO (SSP IV):

Set $t_k = 2^{-i(k)}$, where $i(k)$ is the smallest integer from
$i = 0, 1, \ldots$ such that

\[ x_k + s_k 2^{-i} \in H, \]

and

\[ f[x_k + s_k 2^{-i}] - f(x_k) \le \alpha 2^{-i} s_k^T \nabla f(x_k). \]

In order for a finite value of $i$ satisfying the inequality to
exist, it is sufficient that $s_k$ be a direction of descent. When
certain conditions on the sequence $\{s_k\}$ are met, it is possible to
prove that points of accumulation of the sequence $\{x_k\}$ are
stationary points. When variations of Newton's method are used which
involve a direction of nonpositive curvature, it is possible to prove
convergence (see [McCormick 1977] and Theorem 4) to a second order point
when the following step size procedure is used.

SECOND ORDER ARMIJO (SSP V):

Set $t_k = 2^{-i(k)}$, where $i(k)$ is the smallest integer from
$i = 0, 1, \ldots$ such that

\[ y_k(2^{-i}) = x_k + s_k 2^{-i} + d_k 2^{-i/2} \in H, \]

and

\[ f[y_k(2^{-i})] - f(x_k) \le \alpha \left[ s_k^T \nabla f(x_k) + \tfrac{1}{2} d_k^T \nabla^2 f(x_k) d_k \right] 2^{-i}. \]

In order for a finite value $i(k)$ to exist, it is sufficient that
$s_k, d_k$ be nonascent directions and, in addition, that
$s_k^T \nabla f(x_k) < 0$ whenever $\nabla f(x_k) \ne 0$, and

\[ d_k^T \nabla^2 f(x_k) d_k < 0 \]

whenever $\nabla f(x_k) = 0$.
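The two Armijo-type procedures SSP IV and SSP V can be sketched as short halving loops. The code below is an illustrative sketch rather than the author's implementation: the function names, the default $\alpha = 0.1$, the membership test `in_H` (standing in for the open set $H$), the cap `max_i` on the number of halvings, and the test function $f(x, y) = x^2 + y^4 - y^2$ are all assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def first_order_armijo(f, grad, x, s, alpha=0.1, in_H=lambda z: True, max_i=60):
    """SSP IV sketch: smallest i with x + s*2**-i in H and sufficient decrease."""
    fx = f(x)
    slope = dot(s, grad(x))
    for i in range(max_i + 1):
        t = 2.0 ** -i
        trial = [xj + t * sj for xj, sj in zip(x, s)]
        if in_H(trial) and f(trial) - fx <= alpha * t * slope:
            return t, trial
    raise RuntimeError("no acceptable step within max_i halvings")


def second_order_armijo(f, grad, hess, x, s, d, alpha=0.1,
                        in_H=lambda z: True, max_i=60):
    """SSP V sketch: search along y(t) = x + s*t + d*sqrt(t) with t = 2**-i."""
    fx = f(x)
    hess_d = [dot(row, d) for row in hess(x)]
    model = dot(s, grad(x)) + 0.5 * dot(d, hess_d)
    for i in range(max_i + 1):
        t = 2.0 ** -i
        root_t = 2.0 ** (-i / 2.0)
        trial = [xj + t * sj + root_t * dj for xj, sj, dj in zip(x, s, d)]
        if in_H(trial) and f(trial) - fx <= alpha * model * t:
            return t, trial
    raise RuntimeError("no acceptable step within max_i halvings")


# Hypothetical test function with a saddle point at the origin.
def f(z):
    return z[0] ** 2 + z[1] ** 4 - z[1] ** 2


def grad(z):
    return [2.0 * z[0], 4.0 * z[1] ** 3 - 2.0 * z[1]]


def hess(z):
    return [[2.0, 0.0], [0.0, 12.0 * z[1] ** 2 - 2.0]]


# SSP IV from (1, 1) along the negative gradient.
t4, x4 = first_order_armijo(f, grad, [1.0, 1.0], [-2.0, -2.0])
# SSP V from the saddle point, where the gradient vanishes, with s = 0 and
# d a direction of negative curvature.
t5, x5 = second_order_armijo(f, grad, hess, [0.0, 0.0], [0.0, 0.0], [0.0, 1.0])
```

In both calls the full step $t = 1$ fails the decrease test and $t = 1/2$ is accepted. The second call shows why the curvature term matters: at the saddle point the first order test has zero slope and cannot distinguish any step, while the $d_k$ term lets the iterate leave the stationary point.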


3.  First  Order  Convergence 


Theorem 1. Consider any algorithm for minimizing the continuously
differentiable function $f(x)$ in the open set $H$ which has the
following properties: the algorithm is a nonascent algorithm, i.e.,
$f(x_{k+1}) \le f(x_k)$ for all $k$; consecutive points are of the form
$x_{k+1} = x_k + s_k t_k$ where $s_k^T \nabla f(x_k) \le 0$; and $t_k$
is found by solving the step-size problem SSP I. Let $\bar{x} \in H$ be
a point of accumulation of $\{x_k\}$ and $K_1$ a set of indices such
that $\lim_{k \in K_1} x_k = \bar{x}$. Assume that $\|s_k\| \le M$ for
all $k \in K_1$. Let $\bar{s}$ be any point of accumulation of
$\{s_k\}$ for $k \in K_1$. Then

\[ \bar{s}^T \nabla f(\bar{x}) = 0. \]


Proof: Let $K_2 \subset K_1$ be a set of indices such that
$\bar{s} = \lim_{k \in K_2} s_k$. If $\bar{s} = 0$, the theorem is
obviously true. Assume otherwise.

Case (i). There exists a set of indices $K_3 \subset K_2$ such that
$\lim_{k \in K_3} t_k = 0$. Because the optimal step size problem
generates $t_k$,

\[ 0 = \nabla f(x_k + s_k t_k)^T s_k. \]

Taking the limit (by assumption each $\|s_k\|$ is uniformly bounded
above and $t_k \to 0$),

\[ 0 = \nabla f(\bar{x})^T \bar{s}. \]


Case (ii). Here, $\liminf_{k \in K_2} t_k = \bar{t} > 0$. Let
$K_4 \subset K_2$ be a set of indices such that $t_k > \bar{t}/2$ for
all $k \in K_4$. Assume the contrary of the theorem conclusion, i.e.,
$\nabla f(\bar{x})^T \bar{s} \le -\delta < 0$. Then there is a
neighborhood $N(\bar{x})$ about $\bar{x}$ and a set of indices
$K_5 \subset K_4$ such that for $x \in N(\bar{x})$ and $k \in K_5$,
$\nabla f(x)^T s_k \le -\delta/2 < 0$.

Let $\hat{t} > 0$ be a scalar small enough such that for all
$0 \le t \le \hat{t}$, all $k \in K_5$,

\[ x_k + s_k t \in N(\bar{x}). \]

Pick $t^* = \min(\bar{t}/2, \hat{t})$. Then


\[
\begin{aligned}
f(\bar{x}) - f(x_0) &= \sum_{k=0}^{\infty} [f(x_{k+1}) - f(x_k)] \le \sum_{k \in K_5} [f(x_{k+1}) - f(x_k)] && \text{(the nonascent property)} \\
&\le \sum_{k \in K_5} [f(x_k + s_k t^*) - f(x_k)] && \text{(because of the step-size procedure)} \qquad (2) \\
&= \sum_{k \in K_5} \nabla f(x_k + s_k \tau_k)^T s_k t^* && \text{(Taylor's Theorem with } 0 \le \tau_k \le t^*) \\
&\le \sum_{k \in K_5} -(\delta/2) t^* = -\infty.
\end{aligned}
\]

This contradiction shows the truth of the theorem for Case (ii).


Theorem 2 (Convergence of SSP II): Consider the same hypotheses
and statements as in Theorem 1 except that the step size procedure SSP
II is used. Then the same conclusions hold.

Proof: Nothing in the proof of Theorem 1 changes except it is
noted that Inequality (2) follows when SSP II is used because $t_k$ is
chosen as a global minimizer in $H$ of $f(x)$ along the vector $s_k$
emanating from $x_k$.

Comment: Unfortunately, the same proof cannot be used for SSP
III. It is theoretically possible that if $t_k$ is chosen to be any
local minimum [albeit with value less than $f(x_k)$], Inequality (2) may
not be valid.

Under  similar  assumptions,  convergence  to  a  stationary  point  can 
be  shown  if  the  Armijo  step  size  procedure  SSP  IV  is  used. 

Theorem 3: Suppose SSP IV is used as the step size procedure.
Let $\bar{x} \in H$ be a point of accumulation of $\{x_k\}$ and $K_1$ a
set of indices such that $x_k \to \bar{x}$ for $k \in K_1$. Assume that
$f \in C^1$, that there exists a value $\beta > 0$ such that

\[ \|s_k\| \ge \beta \|f'(x_k)\| \qquad (3) \]

for all $k \in K_1$, that $f'(x_k) s_k \le 0$ for each $k$, that the
sequence $\{s_k\}$ is uniformly bounded ($k \in K_1$), and that

\[ \limsup_{k \in K_1} \frac{f'(x_k) s_k}{\|f'(x_k)\| \, \|s_k\|} = -\delta < 0. \qquad (4) \]

Then

\[ f'(\bar{x}) = 0. \]


Proof: There are two cases to consider.

Case (i). The set of indices $\{i(k)\}$ for $k \in K_1$ is uniformly
bounded above (by a number $I$). Because the sequence $\{f(x_k)\}$ is
monotone decreasing and each $f'(x_k) s_k \le 0$, it follows by summing
appropriately that

\[ f(\bar{x}) - f(x_0) \le \sum_{k=0}^{\infty} \alpha 2^{-i(k)} f'(x_k) s_k \le \sum_{k \in K_1} \alpha 2^{-i(k)} f'(x_k) s_k. \]

Further, it follows from (3), (4), and the assumed uniform bound on
$i(k)$ that

\[ \alpha 2^{-i(k)} f'(x_k) s_k \le -\alpha \delta \beta 2^{-I} \|f'(x_k)\|^2, \]

for each $k \in K_1$. It follows directly, then, that
$f'(x_k) \to 0$ for $k \in K_1$, and by the continuity of $f'(x)$ that
$f'(\bar{x}) = 0$.


Case (ii). There is a subset of indices $K_2 \subset K_1$ such that
$\lim_{k \in K_2} i(k) = \infty$. If the cause for termination of
iteration $k$ were that $x_k + s_k 2^{-i(k)+1} \notin H$ infinitely
often, then since $i(k) \to \infty$ for $k \in K_2$, and because the
$\{s_k\}$ are uniformly bounded, it follows that $\bar{x}$ is on the
boundary of $H$. Since $H$ is an open set, $\bar{x} \notin H$, a
contradiction to the theorem assumption. Thus, without loss of
generality, assume for all $k \in K_2$,

\[
\begin{aligned}
f'(x_k) s_k 2^{-i(k)+1} + o(\|s_k\| 2^{-i(k)+1}) &= f(x_k + s_k 2^{-i(k)+1}) - f(x_k) \\
&> \alpha 2^{-i(k)+1} f'(x_k) s_k.
\end{aligned}
\]

Transposing, using (4) and dividing by $\|s_k\| 2^{-i(k)+1}$ yields

\[ \frac{o(\|s_k\| 2^{-i(k)+1})}{2^{-i(k)+1} \|s_k\|} > (\alpha - 1) \frac{f'(x_k) s_k}{\|s_k\|} \ge (1 - \alpha) \delta \|f'(x_k)\|. \]

Because the $\{s_k\}$ are uniformly bounded, taking the limit as
$k \to \infty$ for $k \in K_2$ yields the desired result.


4.  Second  Order  Convergence 

Some algorithms are concerned with producing accumulation points
which have, in addition to the stationarity property, the property that
their Hessian matrices are positive semidefinite. So far only
algorithms which compute explicit second derivative information have
been modified to produce this kind of convergence. It is theoretically
possible that by using a finite difference approximation technique,
similar convergence results could be obtained.

Algorithms which produce second order convergence must check the
Hessian matrix at each iteration. If it is indefinite, a nonascent
direction of nonpositive curvature must be computed as well as a
nonascent direction. A convenient step size rule to use, then, is SSP V.

If the vector $s_k$ forms a sufficiently small angle with the
negative gradient vector, and if the direction of nonpositive curvature
acts sufficiently like an eigenvector associated with the minimum
eigenvalue, then an interesting convergence theorem can be proved.
Theorem 4 (Second Order Convergence of SSP V [McCormick 1979]):
Assume that $f \in C^2$ in $H$. Suppose that in minimizing $f$ in $H$ an
algorithm with the descent property uses SSP V an infinite number of
times. [The descent property is simply that $f(x_{k+1}) \le f(x_k)$ for
all $k$.] Let $K_1$ be the infinite set of indices for which SSP V is
used, and let $\bar{x}$ denote a point of accumulation in $H$ of
$\{x_k\}$ for $k \in K_1$. Let $K_2 \subset K_1$ be a set of indices
such that $\lim_{k \in K_2} x_k = \bar{x}$. Some regularity properties
on the sequences $\{s_k\}$ and $\{d_k\}$ are required to hold. There
exists a value $\beta > 0$ such that

\[ \|s_k\| \ge \beta \|f'(x_k)\|, \quad \text{for all } k \in K_2. \qquad (5) \]

There is a value $\delta > 0$ such that

\[ \limsup_{k \in K_2} \frac{f'(x_k) s_k}{\|f'(x_k)\| \, \|s_k\|} = -\delta. \qquad (6) \]

There is a value $\gamma > 0$ such that

\[ d_k^T f''(x_k) d_k \le (e_k^{\min})^T f''(x_k) e_k^{\min} \gamma, \quad \text{for all } k \in K_2, \qquad (7) \]

where $e_k^{\min}$ is an eigenvector of $f''(x_k)$ associated with its
minimum eigenvalue. The sequences $\{s_k\}$ and $\{d_k\}$ are uniformly
bounded.

Then: $\bar{x}$ is a stationary point, i.e.,

\[ f'(\bar{x}) = 0, \]

and

\[ f''(\bar{x}) \]

is a positive semidefinite matrix with at least one eigenvalue equal to
zero.

Proof: There are two cases to consider.

Case (i). The integers $\{i(k)\}$ for $k$ in $K_2$ are uniformly
bounded above by some value $I$. Because of the descent property, it
follows that all points of accumulation have the same function value.
Then

\[
\begin{aligned}
f(\bar{x}) - f(x_0) &\le \sum_{k \in K_2} [f(x_{k+1}) - f(x_k)] \\
&\le \alpha \sum_{k \in K_2} 2^{-i(k)} \left[ f'(x_k) s_k + \tfrac{1}{2} d_k^T f''(x_k) d_k \right] \\
&\le \alpha 2^{-I} \sum_{k \in K_2} \left[ -\delta \beta \|f'(x_k)\|^2 + \tfrac{1}{2} (e_k^{\min})^T f''(x_k) e_k^{\min} \gamma \right].
\end{aligned}
\]

Since $f(\bar{x})$ is finite, and since each term in brackets is less
than or equal to zero for each $k \in K_2$, it follows that
$f'(\bar{x}) = 0$, and that
$\bar{e}_{\min}^T f''(\bar{x}) \bar{e}_{\min} = 0$, where
$\bar{e}_{\min}$ is some accumulation point of $\{e_k^{\min}\}$ for
$k \in K_2$.


Case (ii). There is a subset $K_3 \subset K_2$ such that
$\lim_{k \in K_3} i(k) = \infty$. Because of the definition of $i(k)$,
then either

\[ y_k(2^{-i(k)+1}) \notin H, \]

or

\[ f[y_k(2^{-i(k)+1})] - f(x_k) > \alpha 2^{-i(k)+1} \left[ f'(x_k) s_k + \tfrac{1}{2} d_k^T f''(x_k) d_k \right]. \qquad (8) \]

If the former condition held infinitely often, then because
$y_k(2^{-i(k)+1}) \to \bar{x}$ also ($k \in K_3$), it follows that
$\bar{x}$ is on the boundary of $H$. Since $H$ is an open set,
$\bar{x} \notin H$, a contradiction to the theorem hypothesis.
Therefore, without loss of generality, (8) can be considered to hold for
all $k \in K_3$.

Because $f \in C^2$ and because the sequences $\{s_k\}$ and $\{d_k\}$
are assumed to be uniformly bounded, the left-hand side of the
inequality (8) can be written

\[
\begin{aligned}
& f'(x_k) s_k 2^{-i(k)+1} + f'(x_k) d_k 2^{-[i(k)-1]/2} \\
& \quad + \tfrac{1}{2} \left[ s_k 2^{-i(k)+1} + d_k 2^{-[i(k)-1]/2} \right]^T f''(x_k) \left[ s_k 2^{-i(k)+1} + d_k 2^{-[i(k)-1]/2} \right] \\
& \quad + o(2^{-i(k)+1}).
\end{aligned}
\]

Combining like terms and incorporating where appropriate into
$o(2^{-i(k)+1})$ yields [using the fact that $f'(x_k) d_k \le 0$],

\[
\begin{aligned}
o(2^{-i(k)+1}) &\ge (\alpha - 1) \left[ f'(x_k) s_k + \tfrac{1}{2} d_k^T f''(x_k) d_k \right] 2^{-i(k)+1} \\
&\ge (\alpha - 1) \left[ -\delta \beta \|f'(x_k)\|^2 + \tfrac{1}{2} (e_k^{\min})^T f''(x_k) e_k^{\min} \gamma \right] 2^{-i(k)+1}
\end{aligned}
\]

[using (5), (6), and (7)]. Dividing by $2^{-i(k)+1}$, taking the limit
as $k \to \infty$ (for $k \in K_3$) yields, by the argument in Case (i),
the desired result.

A  different  strategy  for  minimizing  a  function  whose  Hessian  is 
not  always  positive  definite  is  to  compute  a  direction  of  nonpositive  or 
negative  curvature  and  optimize  in  that  direction.  The  theorem  below  is 
useful  for  proving  convergence  of  algorithms  using  this  strategy. 

Theorem 5: Assume as in Theorem 1, and in addition assume that
$f(x)$ is twice continuously differentiable in $H$. Then in addition to
the conclusion of Theorem 1,

\[ \bar{s}^T \nabla^2 f(\bar{x}) \bar{s} \ge 0. \]


Proof: Let $K_2 \subset K_1$ be a set of indices such that
$\bar{s} = \lim_{k \in K_2} s_k$. If $\bar{s} = 0$ the theorem is
obviously true. Assume otherwise.

Case (i). There exists a set of indices $K_3 \subset K_2$ such that
$\lim_{k \in K_3} t_k = 0$. Because the optimal step size procedure
generates $t_k$,

\[ s_k^T \nabla^2 f(x_k + s_k t_k) s_k \ge 0. \]

Taking the limit yields the desired result for Case (i).


Case (ii). Here, $\liminf_{k \in K_2} t_k = \bar{t} > 0$. Let
$K_4 \subset K_2$ be a set of indices such that $t_k > \bar{t}/2$ for
all $k \in K_4$. Assume the contrary of the theorem conclusion, i.e.,
that $\bar{s}^T \nabla^2 f(\bar{x}) \bar{s} = -\delta < 0$. Then there
is a neighborhood $N(\bar{x})$ about $\bar{x}$ and a set of indices
$K_5 \subset K_4$ such that for $x \in N(\bar{x})$ and $k \in K_5$,
$s_k^T \nabla^2 f(x) s_k \le -\delta/2 < 0$.

Let $\hat{t} > 0$ be a scalar small enough such that for all
$0 \le t \le \hat{t}$, all $k \in K_5$,

\[ x_k + s_k t \in N(\bar{x}). \]

Pick $t^* = \min(\bar{t}/2, \hat{t})$. Then


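\[
\begin{aligned}
f(\bar{x}) - f(x_0) &= \sum_{k=0}^{\infty} [f(x_{k+1}) - f(x_k)] \le \sum_{k \in K_5} [f(x_{k+1}) - f(x_k)] && \text{(the nonascent property)} \\
&\le \sum_{k \in K_5} [f(x_k + s_k t^*) - f(x_k)] && \text{(because of the step size procedure)} \\
&= \sum_{k \in K_5} \left[ \nabla f(x_k)^T s_k t^* + s_k^T \nabla^2 f(x_k + s_k \tau_k) s_k (t^*)^2 / 2 \right] && \text{(Taylor's theorem with } 0 \le \tau_k \le t^*) \\
&\le \sum_{k \in K_5} s_k^T \nabla^2 f(x_k + s_k \tau_k) s_k (t^*)^2 / 2 && \text{(fact that } s_k^T \nabla f(x_k) \le 0) \\
&\le \sum_{k \in K_5} -(\delta/2)(t^*)^2 / 2 = -\infty.
\end{aligned}
\]

This contradiction proves the theorem for Case (ii).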
5.  Applications

The well-known method of steepest descent chooses as the direction
of search at each iteration the negative gradient vector, i.e.,
$s_k = -\nabla f(x_k)$. That points of accumulation generated by this
method are stationary points follows directly from Theorem 1. The
conclusion is that $\bar{s}^T \nabla f(\bar{x}) = 0$, which means that
$\|\nabla f(\bar{x})\|^2 = 0$.
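The steepest descent application can be illustrated on a strictly convex quadratic, where the optimal step size has a closed form and SSP I, II, and III coincide. The sketch below is an illustration under assumptions not in the paper: the objective $f(x) = \tfrac{1}{2} x^T A x - b^T x$ with a symmetric positive definite $A$, the function name, the stopping tolerance `tol`, and the sample data.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in m]


def steepest_descent_quadratic(A, b, x, iterations=200, tol=1e-12):
    """Steepest descent on f(x) = 0.5*x'Ax - b'x with exact line minimization."""
    for _ in range(iterations):
        g = [ax - bj for ax, bj in zip(mat_vec(A, x), b)]  # gradient A x - b
        gg = dot(g, g)
        if gg <= tol * tol:
            break
        s = [-gj for gj in g]              # search direction s_k = -grad f(x_k)
        t = gg / dot(s, mat_vec(A, s))     # exact minimizer of f(x + s t) over t
        x = [xj + t * sj for xj, sj in zip(x, s)]
    return x


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_limit = steepest_descent_quadratic(A, b, [0.0, 0.0])
residual = [ax - bj for ax, bj in zip(mat_vec(A, x_limit), b)]
```

For this data the iterates approach the unique stationary point $(0.2, 0.6)$, where the gradient `residual` vanishes to within rounding, in agreement with the conclusion $\|\nabla f(\bar{x})\|^2 = 0$.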

It is easy to cite other applications of these theorems to prove
convergence of algorithms. They essentially shift the burden of proof
to that of showing that the hypotheses of the theorems are satisfied;
in many cases this is difficult. The application to be pursued here is
the modified Newton method. This type of algorithm (see [McCormick
197?] for a survey of these methods) is one which modifies the classical
Newton procedure when a point is encountered where the Hessian matrix is
not positive definite. The modification considered here is one which
uses $n$ optimal steps at each iteration. It is not recommended since
it is computationally prohibitive, but it is considered to illustrate
the application of the general theorems.


Let $E_k \Lambda_k E_k^T = \nabla^2 f(x_k)$ be an
eigenvalue-eigenvector reduction of the Hessian matrix at $x_k$, i.e.,
$E_k (E_k)^T = I$, and $\Lambda_k$ is a diagonal matrix. Let $e_j^k$ be
the $j$th column of $E_k$. Set $y_1^k = x_k$. In general, for the $j$th
step of the $k$th iteration, find $y_{j+1}^k$ by solving the step size
problem (either SSP I or SSP II)

\[ \underset{t \ge 0}{\text{minimize}} \; f(y_j^k \pm e_j^k t) \]

\[ \text{subject to } t \in \{ t \mid y_j^k \pm e_j^k t \in H \}. \]

The sign is chosen so that $\pm e_j^k$ is a nonascent direction at
$y_j^k$. This is to be done for $j = 1, \ldots, n$. The last point
generated is taken to be $x_{k+1}$.
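One outer iteration of this eigenvector scheme can be sketched for $n = 2$. The code below is a simplified illustration, not the algorithm as the author would implement it: it assumes a quadratic $f(x) = \tfrac{1}{2} x^T A x - b^T x$ with symmetric positive definite $A$, so that each step size problem has a closed-form solution and the set $H$ is the whole plane. The helper names and sample data are hypothetical. For a general $f$ with an indefinite Hessian, each one-dimensional problem would have to be solved by SSP I or SSP II instead.

```python
import math


def eigenvectors_sym2(a, b, c):
    """Orthonormal eigenvectors of [[a, b], [b, c]] from one Jacobi rotation."""
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    return [(math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta))]


def eigen_sweep(A, b, x):
    """One outer iteration: exact line minimizations along +/- e_1 and +/- e_2."""
    y = list(x)
    for e in eigenvectors_sym2(A[0][0], A[0][1], A[1][1]):
        # Gradient of the quadratic at the current inner point y_j.
        g = [A[0][0] * y[0] + A[0][1] * y[1] - b[0],
             A[1][0] * y[0] + A[1][1] * y[1] - b[1]]
        slope = g[0] * e[0] + g[1] * e[1]
        if slope > 0.0:
            # Choose the sign so that the direction is nonascent at y_j.
            e = (-e[0], -e[1])
            slope = -slope
        Ae = (A[0][0] * e[0] + A[0][1] * e[1],
              A[1][0] * e[0] + A[1][1] * e[1])
        curvature = e[0] * Ae[0] + e[1] * Ae[1]  # positive because A is PD
        t = -slope / curvature                   # exact minimizer, t >= 0
        y = [y[0] + t * e[0], y[1] + t * e[1]]
    return y


A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_next = eigen_sweep(A, b, [0.0, 0.0])
```

Because the eigenvectors of $A$ are mutually conjugate with respect to $A$, the two exact line minimizations reach the minimizer $(0.2, 0.6)$ of this quadratic in a single sweep. That shortcut is specific to the quadratic case and is not claimed by the paper for general functions.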


Theorem 6: Assume that $f(x)$ is twice continuously differentiable
in the open set $H$. Suppose that the algorithm just described
is applied to the unconstrained minimization of $f$ in $H$. Assume that
there is a single point of accumulation (call it $\bar{x}$) generated by
the algorithm. Then $f'(\bar{x}) = 0$, and $\nabla^2 f(\bar{x})$ is a
positive semidefinite matrix.


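Proof: Let $\bar{E}$ be a matrix of accumulation of $\{E_k\}$ with
columns $\{\bar{e}_j\}$. It follows from Theorem 1 (or Theorem 2, when
SSP II is used) that $f'(\bar{x}) \bar{e}_1 = 0$. Since by assumption
$\{y_2^k\}$ has the single point of accumulation $\bar{x}$, from
Theorem 1 (or Theorem 2), $f'(\bar{x}) \bar{e}_2 = 0$ also. Inductively
it is trivial to show that $f'(\bar{x}) \bar{e}_j = 0$, for
$j = 1, \ldots, n$. Since the $\bar{e}_j$'s are linearly independent,
it must be the case that $f'(\bar{x}) = 0$.

Similar reasoning shows that
$\bar{e}_j^T \nabla^2 f(\bar{x}) \bar{e}_j \ge 0$, for
$j = 1, \ldots, n$. By continuity the $\bar{e}_j$ are orthonormal
eigenvectors of $\nabla^2 f(\bar{x})$, so these quantities are its
eigenvalues, and thus $\nabla^2 f(\bar{x})$ is a positive semidefinite
matrix. Q.E.D.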


Practical modified Newton algorithms for minimizing unconstrained
functions differ in their strategies when faced with an indefinite
Hessian, and in their computation of the estimates of the "positive
part" of the Hessian and directions of nonpositive curvature. The
theorems presented herein should be a help in proving convergence of
such algorithms by isolating the components of the proof which are
independent of the linear algebra used to generate the necessary
quantities.